Convergent Fitted Value Iteration with Linear Function Approximation
Abstract
Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, “Expansion-Constrained Ordinary Least Squares” (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence, we constrain the least squares regression operator to be a non-expansion in the ∞-norm. We show that the space of function approximators that satisfy this constraint is richer than the space of “averagers,” we prove a minimax property of the ECOLS residual error, and we give an efficient algorithm for computing the coefficients of ECOLS based on constraint generation. We illustrate the algorithmic convergence of FVI with ECOLS in a suite of experiments and discuss its properties.
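The non-expansion condition in the abstract can be checked numerically on a small case. The sketch below is not the ECOLS algorithm; it only computes the ordinary least squares "hat" operator H = Φ(ΦᵀΦ)⁻¹Φᵀ for a hypothetical three-state design with features [1, s], and compares its induced ∞-norm (maximum absolute row sum) with that of a hand-picked averager. The feature matrix, the averager, and all function names are assumptions made for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_hat_matrix(phi):
    """H = Phi (Phi^T Phi)^{-1} Phi^T maps targets y to OLS fitted values H y."""
    phi_t = transpose(phi)
    return matmul(matmul(phi, inverse(matmul(phi_t, phi))), phi_t)


def inf_norm(m):
    """Induced infinity-norm of a matrix: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


# Hypothetical design: three sample states s = 0, 1, 2 with features [1, s].
phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ols_norm = inf_norm(ols_hat_matrix(phi))

# Hypothetical averager: nonnegative rows that each sum to 1.
averager = [[1.0, 0.0, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]]
avg_norm = inf_norm(averager)

print(f"OLS hat matrix inf-norm: {ols_norm:.4f}")
print(f"Averager inf-norm:       {avg_norm:.4f}")
```

For this design the OLS operator has ∞-norm 4/3, so it is not a non-expansion, while the averager has norm exactly 1. With a discount factor such as γ = 0.9, the bound γ‖H‖∞ on the composed FVI update exceeds 1, so the usual contraction argument no longer applies; this alone does not prove divergence, but it shows why a constraint on the regression operator is needed.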
Similar resources
A Convergent Form of Approximate Policy Iteration
We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with ...
Bias Correction and Confidence Intervals for Fitted Q-iteration
We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator called soft-threshold estimator, derive it as an approximate empirical Bayes estimator, ...
An Effective Method for Seventh-Order Boundary Value Problems
In this paper, we use the Optimal Homotopy Asymptotic Method (OHAM) to find approximate solutions of seventh-order linear and nonlinear boundary value problems. The approximate solution obtained with OHAM is compared with the Variational Iteration Method (VIM) and exact solutions, and excellent agreement is observed. The approximate solution of the equations is obtained in terms of convergent seri...
Dhage iteration method for PBVPs of nonlinear first order hybrid integro-differential equations
In this paper, the author proves algorithms for the existence as well as the approximation of solutions to a couple of periodic boundary value problems of nonlinear first order ordinary integro-differential equations using operator theoretic techniques in a partially ordered metric space. The main results rely on the Dhage iteration method embodied in the recent hybrid fixed point theorems of D...
Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes
In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with |S| states, |A| actions, discount factor γ ∈ (0, 1), and rewards in the range [−M, M], we show how to compute an ε-optimal policy, with probability 1 − δ in time Õ (( |S||A|+ |S||A| (1− γ) ) log ( M ε ) log...